Journal of Open Source Software

The Open Journal

Preprints posted in the last 90 days, ranked by how well they match Journal of Open Source Software's content profile, based on 22 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is already an above-average fit.

1
TRaP: An Open-source, Reproducible Framework for Raman Spectral Preprocessing across Heterogeneous Systems

Zhu, Y.; Lionts, M. M.; Haugen, E.; Walter, A. B.; Voss, T. R.; Grow, G. R.; Liao, R.; McKee, M. E.; Locke, A.; Hiremath, G.; Mahadevan-Jansen, A.; Huo, Y.

2026-03-27 bioengineering 10.64898/2026.03.26.714582 medRxiv
Top 0.1%
17.4%

Raman spectroscopy offers a uniquely rich window into molecular structure and composition, making it a powerful tool across fields ranging from materials science to biology. However, the reproducibility of Raman data analysis remains a fundamental bottleneck. In practice, transforming raw spectra into meaningful results is far from standardized: workflows are often complex, fragmented, and implemented through highly customized, case-specific code. This challenge is compounded by the lack of unified open-source pipelines and the diversity of acquisition systems, each introducing its own file formats, calibration schemes, and correction requirements. Consequently, researchers must frequently rely on manual, ad hoc reconciliation of processing steps. To address this gap, we introduce TRaP (Toolbox for Reproducible Raman Processing), an open-source, GUI-based Python toolkit designed to bring reproducibility, transparency, and portability to Raman spectral analysis. TRaP unifies the entire preprocessing-to-analysis pipeline within a single, coherent framework that operates consistently across heterogeneous instrument platforms (e.g., Cart, Portable, Renishaw, and MANTIS). Central to its design is the concept of fully shareable, declarative workflows: users can encode complete processing pipelines into a single configuration file (e.g., JSON), enabling others to reproduce results instantly without reimplementing code or reverse-engineering undocumented steps. Beyond convenience, TRaP integrates configuration management, X-axis calibration, spectral response correction, interactive processing, and batch execution into a workflow-driven architecture that enforces deterministic, repeatable operations. Every transformation is explicitly recorded, making the full processing history transparent, inspectable, and reproducible. 
This eliminates ambiguity in how results are generated and ensures that identical protocols can be applied consistently across datasets and experimental contexts. Through representative use cases, we show that TRaP enables seamless, reproducible preprocessing of Raman spectra acquired from diverse platforms within a unified environment. We hope TRaP can empower Raman data processing as a reproducible, shareable, and systematized scientific practice, aligning it with modern standards for computational research. TRaP is released as open-source software at https://github.com/hrlblab/TRaP
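
The declarative-workflow idea described in this abstract can be illustrated with a minimal sketch. It is not TRaP's code or configuration schema: the step names (`subtract_baseline`, `normalize_max`), the `steps`/`params` keys, and the `run_workflow` helper are all hypothetical, chosen only to show how a JSON file can fully determine a processing pipeline and leave a recorded history.

```python
import json

# Hypothetical processing steps; TRaP's real steps and names may differ.
def subtract_baseline(spectrum, offset):
    """Subtract a constant offset from every intensity value."""
    return [y - offset for y in spectrum]

def normalize_max(spectrum):
    """Scale the spectrum so that its maximum intensity equals 1."""
    peak = max(spectrum)
    return [y / peak for y in spectrum]

# Registry mapping the step names used in the config to functions.
STEPS = {"subtract_baseline": subtract_baseline, "normalize_max": normalize_max}

def run_workflow(spectrum, config_text):
    """Apply the steps listed in a JSON config, in order, and record each one."""
    config = json.loads(config_text)
    history = []
    for step in config["steps"]:
        params = step.get("params", {})
        spectrum = STEPS[step["name"]](spectrum, **params)
        history.append({"name": step["name"], "params": params})
    return spectrum, history

# Assumed config layout: an ordered list of named steps with optional parameters.
CONFIG = """
{"steps": [
  {"name": "subtract_baseline", "params": {"offset": 2.0}},
  {"name": "normalize_max"}
]}
"""

processed, history = run_workflow([2.0, 4.0, 6.0, 10.0], CONFIG)
print(processed)  # [0.0, 0.25, 0.5, 1.0]
```

Because the order and parameters of every step live in the config text, sharing that text is enough for someone else to repeat the same transformations on their own spectra.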

2
Track Hub Quickload Translator: Convert Track Hub or Quickload data for viewing in the UCSC Genome Browser or the Integrated Genome Browser

Freese, N. H.; Raveendran, K.; Sirigineedi, J. S.; Chinta, U. L.; Badzuh, P.; Marne, O.; Shetty, C.; Naylor, I.; Jagarapu, S.; Loraine, A.

2026-03-30 bioinformatics 10.64898/2026.03.26.708838 medRxiv
Top 0.1%
3.7%

Summary: Track Hub Quickload Translator is a web application that interconverts University of California Santa Cruz (UCSC) Genome Browser track hub and Integrated Genome Browser (IGB) data repository formats by translating the track hub or Quickload configuration files into the other genome browser's required format. This new work enables researchers to work with tens of thousands of published genome assemblies for the first time using either browser. Availability and Implementation: Track Hub Quickload Translator is implemented using Python 3 and is freely available to use at translate.bioviz.org. Integrated Genome Browser is available from BioViz.org. Track Hub Quickload Translator, GenArk Genomes, and the Integrated Genome Browser source code is available from github.org/lorainelab. Contact: aloraine@charlotte.edu

3
SaVanache: indexing and visualizing pangenome variation graphs

Mohamed, M.; Durant, E.; Rouard, M.; Muller, C.; Monat, C.; Conte, M.; Sabot, F.

2026-05-08 bioinformatics 10.64898/2026.05.05.722901 medRxiv
Top 0.1%
2.7%

With the rapid increase in genome sequencing and the growing availability of genomic resources, genomics is shifting toward pangenome representations that capture intra- and inter-specific diversity by integrating multiple genomes into a single entity. These pangenomes are increasingly modeled as graphs, encoding complex genomic variations in structures such as de Bruijn or variation graphs. However, while genome browsers provide standard and effective solutions for visualizing single or limited numbers of genomes, equivalent interactive tools for graph-based pangenomes remain limited, particularly for variation graph models. We developed SaVanache, a multi-resolution visualization interface designed to explore pangenome variation graphs at various depths. SaVanache enables the exploration of both global diversity and structural variations (SVs) across genomes relative to a user-defined linear pivot genome. Unlike synteny viewers, SaVanache emphasizes variations by representing SV types through a dedicated set of glyphs, facilitating intuitive one-to-many comparisons. To support smooth exploration, SaVanache preprocesses a Graphical Fragment Assembly (GFA) pangenome file into optimized index and data structures, enabling fast, real-time queries on large pangenome graphs. By combining advanced visualization techniques with efficient data handling, SaVanache provides a robust tool for scientists to analyze and visualize genetic variation within genomes and pangenomes, facilitating the identification of genetic determinants associated with phenotypes of interest and fully exploiting current genomic resources. Author summary: We introduce SaVanache, an innovative tool that transforms the way we explore genomic resources. SaVanache allows visualization and analysis of pangenome variation graphs (PVGs), which capture genomic diversity by integrating structural variants (SVs) and single nucleotide polymorphisms (SNPs) across multiple genomes. 
Unlike traditional genome browsers limited to a few genomes, SaVanache offers a multi-level, user-friendly interface that allows users to explore from whole pangenomes down to individual structural variants, enabling multidimensional research and development. Using a linear pivot genome as a visual reference, SaVanache simplifies complex PVG structures into intuitive comparisons. It efficiently handles large datasets and speeds up data retrieval through internal parsing. The front-end, built with modern JavaScript frameworks, provides interactive and responsive visualization, while the Python/Django backend supports real-time data updates. Users can detect and classify SVs by comparing syntenic segments between genomes, visualized through a novel glyph-based system that uses shapes and colors to represent complex rearrangements. SaVanache supports seamless zooming from chromosome-wide to nucleotide-level views, interactive diversity scatterplots, dynamic pivot genome switching, and grouping genomes by metadata to explore genotype-phenotype links. In addition, export functions bridge visualization with downstream bioinformatics. Developed with user feedback, SaVanache balances biological relevance and computational efficiency, overcoming PVG complexity to empower users with unprecedented insight into genomic diversity and SVs.

4
OpusTaxa: A Unified Workflow for Taxonomic Profiling, Assembly, and Functional Analysis of Shotgun Metagenomes

Chen, Y.-K.; Harker, C. M.; Pham, C. M.; Grundy, L.; Wardill, H. R.; Roach, M. J.; Ryan, F. J.

2026-04-19 bioinformatics 10.64898/2026.04.15.718825 medRxiv
Top 0.1%
2.5%

Shotgun metagenomics has become a cornerstone of microbiome research, yet the complexity of existing workflows remains a major barrier for life scientists without dedicated bioinformatics support. Manual database setup, detailed sample sheet preparation, and management of software dependencies can make routine analysis difficult and time-consuming. Cross-study comparisons are further hampered by inconsistent processing pipelines, database versions, and profiling strategies, limiting reproducibility and the potential for large-scale meta-analyses. We present OpusTaxa, an open-source Snakemake workflow that provides end-to-end processing of short paired-end shotgun metagenomic data with minimal configuration. Users provide either FASTQ files or Sequence Read Archive accessions; OpusTaxa automatically downloads required databases, performs quality control, removes host reads, and executes taxonomic profiling, metagenome assembly, and functional analysis. All analysis modules can be independently toggled, and per-sample outputs are automatically merged into harmonised, cross-sample tables ready for downstream exploration. Across two public datasets, we demonstrate how OpusTaxa can be used to compare consistency across complementary taxonomic profilers and to estimate microbial load in addition to standard metagenomic workflows. Availability: OpusTaxa is freely available at https://github.com/yenkaiC/OpusTaxa. Documentation, test data, and example configurations are included in the repository.

5
BaSiCPy: Scalable and Robust Shading Correction for Optical Microscopy Images

Liu, Y.; Fukai, Y. T.; Cano-Muniz, S.; Perez, V.; Todorov, M.; Ortega, G.; Morello, T.; Loeffler, D.; Paetzold, J.; Xu, X.; Lamm, L.; Ma, N.; Erturk, A.; Schroeder, T.; Boeck, L.; Schapiro, D.; Schaub, N.; Marr, C.; Peng, T.

2026-05-01 bioengineering 10.64898/2026.04.28.721386 medRxiv
Top 0.1%
2.1%

Quantitative fluorescence microscopy is frequently confounded by spatially varying illumination and temporal intensity drift. Although BaSiC is a widely adopted retrospective correction method, it can fail when foreground content is strongly correlated across images (a common regime in time-lapse, tiled and volumetric acquisitions), and its application often requires manual parameter tuning that limits reproducibility and scalability. We introduce BaSiCPy, a foreground-aware implementation of BaSiC that improves illumination profile estimation under correlated foreground structures, provides automatic hyperparameter selection and accelerates large-scale processing through GPU support. BaSiCPy is distributed as an open-source Python package with graphical and programmatic interfaces, facilitating integration into contemporary bioimage analysis workflows.

6
SpotGraphs: Graph-based analysis of spatially resolved transcriptional data in R

Lee, A. J.; Sanin, D. E.

2026-03-16 bioinformatics 10.64898/2026.03.12.711347 medRxiv
Top 0.1%
2.1%

Introduction: Common spatial transcriptomic analysis pipelines in R focus on pre-processing and visualization, while providing limited and indirect methods to leverage true spatially resolved quantification of transcripts. Often, x,y-coordinates in spatial transcriptomics (ST) data are integrated into analysis via "spatially aware" normalization (Salim et al., 2024), clustering methods (Zhao et al., 2021), or the identification of spatially variable genes (Yan et al., 2025). Though useful, these methods do not provide any opportunity for analysts to adjust or interrogate the underlying graphs that define adjacencies between spots in their data. Here, we present SpotGraphs, a package that allows the user a more direct and flexible option to interact with the x,y-coordinates of their ST data in R through the existing igraph infrastructure (Antonov et al., 2023; Csardi et al., 2025; Csardi & Nepusz, 2006). Similar functionality exists in Python through Squidpy's graph API (Palla et al., 2022), and we compare results obtained from both packages, demonstrating similar performance. Additionally, we provide a set of tools that are useful for ST data analysis, including a toolkit to filter low-quality spots lying on tissue debris, beyond arbitrary thresholds, edit spot-level adjacencies based on spatial clusters, and identify centers or boundaries of user-defined neighborhoods of interest.
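
To make the notion of a spot adjacency graph concrete, here is a minimal Python sketch. It uses neither SpotGraphs (an R package) nor Squidpy; the `radius_graph` and `remove_edge` helpers, the distance rule, and the toy coordinates are all assumptions made for illustration. It builds a graph from x,y-coordinates, flags spots with no neighbors (a crude stand-in for spots on detached debris), and shows that individual adjacencies can be edited by hand.

```python
import math

def radius_graph(coords, radius):
    """Link every pair of spots whose centres lie within `radius` of each other."""
    adjacency = {i: set() for i in range(len(coords))}
    for i, (xi, yi) in enumerate(coords):
        for j in range(i + 1, len(coords)):
            xj, yj = coords[j]
            if math.hypot(xi - xj, yi - yj) <= radius:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency

def remove_edge(adjacency, i, j):
    """Manually drop one adjacency, e.g. across a boundary between clusters."""
    adjacency[i].discard(j)
    adjacency[j].discard(i)

# Toy layout: three spots in a row plus one far-away spot.
spots = [(0, 0), (1, 0), (2, 0), (0, 5)]
graph = radius_graph(spots, radius=1.0)

# Spots without any neighbor are candidates for filtering.
isolated = [i for i, neighbours in graph.items() if not neighbours]
print(isolated)  # [3]

remove_edge(graph, 1, 2)
print(graph[1])  # {0}
```

The pairwise loop is quadratic in the number of spots, which is fine for a sketch but would be replaced by a spatial index on real slides.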

7
Figra: A WebAssembly-based Excel Add-in for publication-quality scientific visualization with ggplot2

Sato, Y.

2026-05-12 bioinformatics 10.64898/2026.05.06.723320 medRxiv
Top 0.1%
1.9%

Data visualization is a critical step in scientific communication. Most researchers rely on subscription-based software for this purpose, which requires ongoing licensing costs. Free alternatives such as R and Python offer publication-quality output but demand programming expertise that many researchers do not possess. Artificial intelligence tools can assist with figure generation but remain frustrating when users wish to fine-tune specific visual parameters to their preference. Meanwhile, Microsoft Excel, the most widely used tool for scientific data storage and management, offers limited visualization capabilities, forcing researchers to transfer their data to external software as an extra step before creating figures. Here we present Figra, a free Excel Office Add-in that eliminates this extra step by enabling publication-quality ggplot2-based figure generation directly within Excel, with simple and direct control over every visual option. Figra leverages WebAssembly technology (webR) to execute R code entirely within the browser, requiring no R installation, no subscription, and no server connection. The add-in supports over 20 chart types spanning distribution plots, grouped comparisons, time-series, scatter plots, and specialized curve-fitting analyses. For applicable chart types, Figra performs automated or manual statistical analysis supporting both paired and unpaired designs across two or more groups. Additionally, Figra exports simplified, executable R code that reproduces the displayed figure, serving as an educational tool for researchers wishing to learn ggplot2. Figra is open-source and freely available at https://h20gg702.github.io/figra-pages/index.html while the source code is provided at https://github.com/h20gg702/Figra.

8
Correlate: A Web Application for Analyzing Gene Sets and Exploring Gene Dependencies Using CRISPR Screen Data

Deolankar, S.; Wermeling, F.

2026-04-04 bioinformatics 10.64898/2026.04.02.716070 medRxiv
Top 0.1%
1.8%

CRISPR screen data provides a valuable resource for understanding gene function and identifying potential drug targets. Here, we present Correlate, a freely accessible web application (https://correlate.cmm.se) that enables exploration of the Cancer Dependency Map (DepMap) CRISPR screen gene effects, hotspot mutations, and translocation/fusion data across more than 1,000 human cancer cell lines. The application supports two main use cases: (i) analysis of user-defined gene sets (e.g. CRISPR screen hits) to identify functionally linked genes based on correlations while providing an overview based on essentiality or user-provided screen statistics; and (ii) exploration of genes of interest in defined biological contexts, such as specific cancer types or mutational backgrounds, to generate hypotheses about gene function and dependencies. Additionally, Correlate supports experimental design by providing rapid overviews of gene essentiality and enabling the identification of cell lines with relevant mutational profiles. In contrast to knowledge-based approaches such as STRING and GSEA, which rely on prior biological annotations and curated interaction networks, Correlate identifies gene connections directly from functional CRISPR screen readouts, offering a complementary and data-driven perspective on gene network analysis. The application runs entirely in the browser, requires no installation or login, and integrates with the Green Listed v2.0 tool family for custom CRISPR screen design.

Highlights:
- Interactive web-based platform for bulk correlation analysis of user-defined gene sets using DepMap CRISPR screen data, requiring no installation or programming expertise.
- Identifies functional gene relationships from CRISPR screen readouts rather than curated annotations, offering a data-driven complement to tools such as GSEA and STRING.
- Enables contextual exploration of gene dependencies across cancer types and mutational backgrounds, supporting hypothesis generation about gene function and therapeutic targets.
- Supports experimental design through gene essentiality overviews, mutation and fusion analysis, and cell line identification, with optional integration of user-provided statistics from CRISPR screens, proteomics, or transcriptomics analyses.

9
gbdraw: a genome diagram generator for microbes and organelles

Kawato, S.

2026-04-09 bioinformatics 10.64898/2026.04.07.716863 medRxiv
Top 0.1%
1.7%

Motivation: Generating graphical diagrams of microbial and organellar genomes is a common and essential task in bioinformatics. Existing tools often present a trade-off: powerful programming libraries require coding skills, while graphical applications require server processing or local installation with complex dependencies. This highlights the need for a tool that offers both programmatic control for batch processing and graphical accessibility for ease of use. Results: To fill this gap, I developed gbdraw, a web application that generates circular and linear genome diagrams from self-contained GenBank or DDBJ files or combinations of GFF3 annotation and FASTA sequence files. Its core functions include visualizing annotated features, plotting GC content/skew tracks, and optionally generating pairwise sequence comparisons for comparative genomics. It is available as both a GUI web application and a command-line utility. Unlike existing web-based tools that require data upload to a remote server, gbdraw operates entirely within the user's web browser. This serverless architecture ensures that sensitive sequence data never leaves the local machine, providing a secure environment for visualizing unpublished genomic data. Availability and Implementation: gbdraw is implemented in Python 3 (version 3.10+) and is freely available under the MIT license. The web app is available at https://gbdraw.app/. Source code and documentation are available at https://github.com/satoshikawato/gbdraw. The local version can be installed from the Bioconda channel using a conda-compatible package manager.
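
The GC content/skew tracks mentioned above follow standard definitions, sketched here as a sliding-window calculation. This is not gbdraw's code: the `gc_tracks` function, its window and step parameters, and the toy sequence are assumptions for illustration. GC skew is computed as (G - C) / (G + C) per window.

```python
def gc_tracks(sequence, window, step):
    """Return (start, gc_content, gc_skew) for each sliding window."""
    seq = sequence.upper()
    tracks = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        gc_content = (g + c) / window
        # GC skew is (G - C) / (G + C); set to 0 when the window has no G or C.
        gc_skew = (g - c) / (g + c) if (g + c) else 0.0
        tracks.append((start, gc_content, gc_skew))
    return tracks

# Toy sequence with a G-rich start, an AT-only middle, and a C-rich end.
tracks = gc_tracks("GGGCATATCCCG", window=4, step=4)
for start, content, skew in tracks:
    print(start, content, skew)
# 0 1.0 0.5
# 4 0.0 0.0
# 8 1.0 -0.5
```

On a circular genome the windows would also need to wrap around the origin, which this sketch leaves out.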

10
MicrobeMS - A MATLAB Toolbox for Microbial Identification Based on Mass Spectrometry

Lasch, P.

2026-05-12 bioinformatics 10.64898/2026.05.08.723807 medRxiv
Top 0.1%
1.6%

Over the last two decades, matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-ToF MS) has become the standard method for identifying bacteria and has found a wide range of applications, especially in clinical microbiology. The method's high taxonomic resolution, minimal sample preparation, and complete, ready-to-use commercial systems, which include instrumentation, experimental protocols, spectral databases, and identification analysis software, were key factors in the success of MALDI-ToF MS as the standard for identifying microorganisms in routine diagnostic laboratories. However, despite the availability of these commercial solutions, there is also a growing need for efficient, cost-effective, vendor-neutral databases and analysis tools. These tools would enable the compilation of user-defined mass spectral databases and the testing of new analysis methods and algorithms, particularly in an academic context. To this end, MicrobeMS software has been developed to cover all stages of MALDI-ToF MS-based identification analysis. MicrobeMS is an easy-to-use desktop application for analyzing mass spectra from microorganisms and performing tasks related to spectrum database compilation. It includes routines for direct data import and export, biomarker peak searches, management of spectrum metadata, testing of spectrum quality, supervised and unsupervised identification analysis and intuitive result display. MicrobeMS is implemented in MATLAB and is freely available as MATLAB pcode for Windows and Linux, as well as a standalone application. 
Over the last fifteen years, the software has undergone continuous development and is now used routinely in various settings at the Centre for Biological Threats and Special Pathogens (ZBS) at the Robert Koch Institute (RKI) in Berlin, Germany, for example in supporting spectrum database compilation, to identify special or rare pathogenic bacteria by advanced identification analysis concepts, or to test in silico MALDI-ToF MS databases derived from microbial genomes. In this software publication the versatility and capabilities of MicrobeMS are demonstrated using a test data set from highly pathogenic bacteria (HPB) which has been obtained as part of a published European Union (EU)-funded External Quality Assurance Exercise (EQAE). MicrobeMS and HPB test data can both be downloaded from https://wiki.microbe-ms.com/. The goal of this software publication is twofold: to raise awareness of MicrobeMS within the scientific community and to encourage the testing of the software and custom-developed MALDI-ToF MS databases of the RKI, which are published at the ZENODO data repository (https://doi.org/10.5281/zenodo.7702374).

11
StrucTTY: An Interactive, Terminal-Native Protein Structure Viewer

Jang, L. S.-e.; Cha, S.; Steinegger, M.

2026-03-19 bioinformatics 10.64898/2026.03.17.712308 medRxiv
Top 0.1%
1.5%

Terminal-based workflows are central to large-scale structural biology, particularly in high-performance computing (HPC) environments and SSH sessions. Yet no existing tool enables real-time, interactive visualization of protein backbone structures directly within a text-only terminal. To address this gap, we present StrucTTY, a fully interactive, terminal-native protein structure viewer. StrucTTY is a single self-contained executable that loads multiple PDB and mmCIF files, normalizes three-dimensional coordinates, and renders protein structures as ASCII graphics. Users can rotate, translate, and zoom in on structures, adjust visualization modes, inspect chain-level features and view secondary structure assignments. The tool supports simultaneous visualization of up to nine protein structures and can directly display structural alignments using Foldseek's output, enabling rapid comparative analysis in headless environments. The source code is available at https://github.com/steineggerlab/StrucTTY.

Key Messages:
- Real-time, interactive protein structure visualization directly within text-only terminals
- ASCII-based, depth-aware rendering of PDB and mmCIF backbone structures
- Multi-structure comparison with direct application of Foldseek alignment transformations
- Designed for headless workflows on remote servers and HPC systems
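
As a rough illustration of what depth-aware ASCII rendering involves, the sketch below normalizes 3D coordinates onto a character grid and keeps only the nearest point per cell (a z-buffer). It is not StrucTTY's renderer: the `scale` and `render_ascii` helpers, the orthographic projection, the shade characters, and the convention that larger z means closer to the viewer are all assumptions.

```python
def scale(value, low, high, size):
    """Map value from [low, high] to an integer index in [0, size - 1]."""
    if high == low:
        return 0
    return round((value - low) / (high - low) * (size - 1))

def render_ascii(points, width, height, shades=".:*#"):
    """Project 3D points onto a text grid; nearer points overwrite farther ones."""
    xs, ys, zs = zip(*points)
    grid = [[" "] * width for _ in range(height)]
    depth = [[None] * width for _ in range(height)]  # z-buffer
    for x, y, z in points:
        col = scale(x, min(xs), max(xs), width)
        # Screen rows grow downward, so flip the y axis.
        row = (height - 1) - scale(y, min(ys), max(ys), height)
        if depth[row][col] is None or z > depth[row][col]:
            depth[row][col] = z
            # Assumed convention: larger z is closer and gets a denser character.
            grid[row][col] = shades[scale(z, min(zs), max(zs), len(shades))]
    return ["".join(r) for r in grid]

# Toy "backbone": a diagonal trace whose last two points share a screen cell.
backbone = [(0, 0, 0), (1, 1, 1), (2, 2, 2), (2, 2, 3)]
lines = render_ascii(backbone, width=3, height=3)
print("\n".join(lines))
```

An interactive viewer would apply a rotation matrix to the points before projection and redraw on each key press; that loop is omitted here.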

12
Cellfoundry: a GPU-accelerated, multi-physics ABM framework for cellular microenvironment and organoid-scale studies

Borau, C.; Chisholm, R.; Richmond, P.

2026-04-25 bioengineering 10.64898/2026.04.22.720218 medRxiv
Top 0.1%
1.5%

Advanced in vitro systems such as organoids and microfluidic organ-on-a-chip platforms enable physiologically richer experimentation, but their complexity creates large parameter spaces and makes it difficult to disentangle the mechanistic roles of transport, mechanics, and extracellular microstructure. Agent-based modelling provides a natural computational counterpart to these systems by representing heterogeneous cells as discrete entities coupled through local rules and environmental fields. However, realistic microenvironment models often remain limited by scalability, simplified extracellular matrix representations, and the practical difficulty of calibrating large numbers of parameters. Here we present Cellfoundry, a computational framework built on a FLAMEGPU2-based modelling template for simulating complex cellular microenvironments. The framework integrates multiple interacting agent populations, including cells, fibrous networks, and focal adhesions mediating attachment dynamics and traction-force transmission. It combines mechanically resolved cell-cell and cell-matrix interactions with multi-species diffusion fields that propagate biochemical signals through the extracellular environment and regulate processes such as metabolism, migration, and cell-cycle progression. Cellfoundry also supports customizable behaviours across multiple cell types, enabling the study of heterogeneous multicellular systems within a unified computational setting. To support reproducible model development and calibration, the framework includes a fibre-network generation module, automated performance benchmarking workflows, post-processing and reporting utilities, and an Optuna-based Bayesian optimization pipeline with configurable single- and multi-objective targets. 
Two showcase examples illustrate these capabilities: a migration assay calibrated against fibroblast motility descriptors and a multi-objective organoid growth scenario reproducing target population composition and expansion dynamics over time. Together, these examples demonstrate how Cellfoundry can be used to build, calibrate, and extend mechanistically interpretable models of coupled biochemical and mechanical dynamics in advanced in vitro systems.

Highlights:
- Highly versatile, GPU-accelerated agent-based framework for cellular microenvironments
- Explicit fibrous ECM networks with dynamic remodelling and focal adhesion agents
- Coupled mechanics and multi-species diffusion regulate cell behaviour in a highly customizable environment
- Modular architecture with automated benchmarking and Bayesian parameter optimization

13
VX: an AI-enabled desktop genome viewer and transcriptome browser with a programmable analysis framework

Shirokikh, N. E.; Cleynen, A.

2026-05-20 bioinformatics 10.64898/2026.05.17.725790 medRxiv
Top 0.1%
1.5%

Background: Genome and transcriptome browsers are central to the interpretation of high-throughput sequencing data, but today's tools assume a human operator at a graphical interface and offer only limited programmability. As large-language-model assistants become routine in bioinformatics [Anthropic, 2024], this creates a bottleneck: agents cannot observe the visual state of the browser or drive it through the same interface as the human user, and analyses remain fragmented across a separate ecosystem of external tools. Transcript-coordinate data, produced by ribosome profiling [Ingolia et al., 2012] and direct RNA sequencing [Garalde et al., 2018], is also awkwardly supported in chromosome-oriented viewers. Results: We present VX, a desktop genome and transcriptome viewer written in D, using GTK 3 and OpenGL, that handles genome-scale and transcriptome-scale data in a unified interface. VX exposes its full functionality through an embedded HTTP API on the loopback interface and a Model Context Protocol server of currently thirty-nine tools, so that scripts and LLM agents can load data, navigate, manage tracks, run analyses, and capture figures through the same contract used by the GUI. An integrated analysis framework provides more than fifty analyses and includes signal processing and peak calling, quantification, variant analysis, alignment statistics, interaction and cross-track comparisons, all with an explicit four-level scope hierarchy running from viewport to whole dataset; results are written to disk and, where appropriate, added as new tracks. Additional features include a magnifier popup for base-resolution inspection (Alt+hover), chromosome-alias resolution across UCSC, Ensembl, and NCBI conventions, viewport video recording via an ffmpeg pipe, and INI-based configuration. Conclusions: VX complements existing desktop and web browsers by providing a native agent-control layer, an integrated analysis framework, and first-class transcript-space handling. 
The binary is freely available for non-commercial use; the HTTP API and MCP protocol are fully specified in this article, so third-party clients can be written independently of the core implementation.

14
GROQ-seq Enables Cross-site Reproducibility for High-Throughput Measurement of Protein Function

Spinner, A.; Ross, D.; Cortade, D.; Ikonomova, S.; Baranowski, C.; Dhroso, A.; Reider Apel, A.; Sheldon, K.; Duquette, C.; Kelly, P. J.; DeBenedictis, E.; Hudson, C.

2026-04-09 bioengineering 10.64898/2026.04.07.716961 medRxiv
Top 0.1%
1.4%

High-throughput functional assays are increasingly used to generate large-scale protein function datasets for protein engineering and machine learning applications. However, the utility of such datasets depends on the reproducibility of the underlying measurements. Here we report reproducible, quantitative measurements of protein sequence-to-function data at scale across two facilities. We analyze GROQ-seq (Growth-based Quantitative Sequencing) measurements of three bacterial transcription factors. Independent barcode measurements of the same sequence produce highly consistent functional estimates, demonstrating strong biological reproducibility (across all transcription factors the mean Root Mean Square Deviation [RMSD] ≈ 0.53 and mean Spearman ≈ 0.63). We also compared experiments performed at two facilities using a shared protocol, but with differing levels of automation and system integration. We observe strong agreement between measurements taken at the two sites (mean RMSD ≈ 0.41 and mean Spearman ≈ 0.730). Orthogonal tests further support this agreement: a classifier trained to distinguish data by site performs near random (AUC = 0.559), and top-ranking variants show strong statistical overlap between experiments. Together, these results demonstrate that GROQ-seq enables reproducible, scalable measurement of protein function suitable for large aggregated datasets.
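
The two agreement metrics quoted in this abstract, RMSD and Spearman correlation, can be computed for a pair of replicate measurement vectors as sketched below. The abstract does not say how the authors pair, filter, or scale their measurements, so the helper functions and the values in `site_a` and `site_b` are purely illustrative assumptions, not GROQ-seq data.

```python
import math

def rmsd(a, b):
    """Root mean square deviation between paired measurements."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def spearman(a, b):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(a), average_ranks(b))

# Made-up function estimates for the same four variants at two sites.
site_a = [1.0, 2.0, 3.0, 4.0]
site_b = [1.5, 2.5, 2.0, 4.5]

print(round(rmsd(site_a, site_b), 4))      # 0.6614
print(round(spearman(site_a, site_b), 4))  # 0.8
```

RMSD is sensitive to systematic offsets between sites, while Spearman depends only on rank order, which is why the two are usually reported together.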

15
DigitalPedon: A Novel Digital Twin Framework for Soil Profile Monitoring and Global Soil Data Interoperability

Youssef, A.; Badreldin, N.

2026-05-08 bioengineering 10.64898/2026.05.05.722891 medRxiv
Top 0.1%
1.3%

The Digital Pedon (DP) is an open-source Python framework that represents a soil profile as a continuously updated digital twin, bridging three persistent gaps in soil science: disconnected models and observations, cross-database interoperability, and the inference gap between raw sensor signals and agronomically meaningful variables. Integrating real-time sensor streams, model-based solver chains (Model-Zoo), GLOSIS-compliant ontology mapping, and a novel LLM agentic interface layer enabling natural language soil queries, the DP supports applications spanning precision agriculture, digital soil mapping, and environmental sustainability assessment. Four proof-of-concept experiments confirm automatic profile initialisation fidelity, solver chain consistency, ontology compliance, and user-defined solver extensibility.

16
Calcium transient detection and segmentation with the astronomically motivated algorithm for background estimation and transient segmentation (Astro-BEATS)

Fan, B.; Bilodeau, A.; Beaupre, F.; Wiesner, T.; Gagne, C.; Lavoie-Cardinal, F.; Hlozek, R.

2026-03-17 bioinformatics 10.64898/2026.03.13.711411 medRxiv
Top 0.1%
1.3%

Significance: Fluorescence-based Ca2+-imaging is a powerful tool for studying localized neuronal activity, including miniature Synaptic Calcium Transients, providing real-time insights into synaptic activity. These transients induce only subtle changes in the fluorescence signal, often barely above baseline, which poses a significant challenge for automated synaptic transient detection and segmentation.
Aim: Detecting astronomical transients similarly requires efficient algorithms that remain robust over a large field of view with varying noise properties. We leverage techniques used in astronomical transient detection for miniature Synaptic Calcium Transient detection in fluorescence microscopy.
Approach: We present Astro-BEATS, an automatic miniature Synaptic Calcium Transient segmentation algorithm that incorporates image estimation and source-finding techniques used in astronomy and is designed for Ca2+-imaging videos. Astro-BEATS uses the Rolling Hough Transform filament detector to construct an estimate of the expected (transient-free) fluorescence signal of both the dendritic foreground and the background. Subtracting this baseline signal yields difference images displaying transient signals. We use Density-Based Spatial Clustering of Applications with Noise (DBSCAN) to find sources clustered in space and time.
Results: Astro-BEATS outperforms current threshold-based approaches for synaptic Ca2+ transient detection and segmentation. The produced segmentation masks can be used to train a supervised deep learning algorithm for improved synaptic Ca2+ transient detection in Ca2+-imaging data. The speed of Astro-BEATS and its applicability to previously unseen datasets without re-optimization make it particularly useful for generating training datasets for deep learning-based approaches.
Conclusion: Astro-BEATS greatly reduces the time needed for the annotation of synaptic Ca2+ transients and removes the significant overhead of human expert annotation, enabling consistent analysis of new Ca2+-imaging datasets.
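
The baseline-subtraction and clustering steps described in the Astro-BEATS abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a per-pixel temporal median as a stand-in for the Rolling Hough Transform baseline, a fixed threshold on the difference images, and a small pure-Python version of Density-Based Spatial Clustering of Applications with Noise applied to (t, y, x) coordinates. All function names and parameter values are hypothetical.

```python
from collections import deque
from statistics import median


def temporal_median_baseline(video):
    """Per-pixel median over time (assumed stand-in for the paper's baseline)."""
    height, width = len(video[0]), len(video[0][0])
    return [[median(frame[y][x] for frame in video) for x in range(width)]
            for y in range(height)]


def difference_video(video, baseline):
    """Subtract the transient-free baseline estimate from every frame."""
    return [[[px - b for px, b in zip(row, brow)]
             for row, brow in zip(frame, baseline)]
            for frame in video]


def candidate_points(diff, threshold):
    """Collect (t, y, x) coordinates whose difference signal exceeds threshold."""
    return [(t, y, x)
            for t, frame in enumerate(diff)
            for y, row in enumerate(frame)
            for x, value in enumerate(row)
            if value > threshold]


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN: one label per point, -1 marks noise."""
    def neighbours(i):
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_samples:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels


# Toy video: 6 frames of 4x4 pixels at intensity 10.
video = [[[10] * 4 for _ in range(4)] for _ in range(6)]
for t in (2, 3):              # a transient spanning two frames and two pixels
    video[t][1][1] = 15
    video[t][1][2] = 15
video[5][3][3] = 15           # an isolated single-pixel blip

diff = difference_video(video, temporal_median_baseline(video))
points = candidate_points(diff, threshold=3)
labels = dbscan(points, eps=1.0, min_samples=3)
print(list(zip(points, labels)))  # four transient points share label 0; the blip is -1
```

Clustering in time as well as space is what lets the isolated single-frame blip be rejected as noise while the two-frame transient is kept as one source.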

17
Verticall: A fast and robust tool for recombination detection in large-scale bacterial genomic datasets

Odih, E. E.; Wick, R. R.; Holt, K. E.

2026-04-24 bioinformatics 10.64898/2026.04.21.719734 medRxiv
Top 0.1%
1.3%

The inference and removal of horizontally acquired genomic regions is a crucial step in phylogenomic analyses for evolutionary studies. Existing tools perform well on clonal lineage-focused datasets on the scale of hundreds of genomes, but are limited in their ability to analyse larger or more diverse datasets. Here we present Verticall, a tool to identify recombinant regions in bacterial assemblies and generate recombination-free phylogenies, which scales to thousands of genomes from clonal to genus-level diversity. Verticall uses a non-parametric approach to assign genomic regions as horizontally or vertically related based on the distribution of pairwise genetic distances between genomes. Recombination-free phylogenetic trees may be inferred by either calculating a pairwise genetic distance matrix from vertical-only regions (distance-tree approach) or by pairwise comparisons of all genomes to a reference and then masking horizontally acquired regions in a pseudo-alignment to the reference (alignment-tree approach). We demonstrate Verticall's performance using four publicly available whole-genome sequence datasets of varying sample sizes (range: 154 to 4,857 genomes) and evolutionary scales (ranging from within-lineage to genus-wide diversity). Across all four datasets, Verticall showed comparable or superior performance to the established tools Gubbins and ClonalFrameML in terms of computational efficiency, plausibility of inferred phylogenetic trees, and recovery of temporal signal for molecular dating. Our results show that Verticall is a useful tool to detect recombination more efficiently and accurately, particularly when applied to datasets for which existing tools are limited, including large datasets with hundreds to thousands of genomes and those that span entire species or genera. Verticall is available free and open source at https://github.com/rrwick/Verticall.
Impact Statement: Many bacterial species can acquire genetic material from external sources and stably incorporate it into their own genomes through homologous recombination. During phylogenomic analyses to investigate outbreaks or for evolutionary studies, a core objective is often to reconstruct the evolutionary history of the studied organisms independent of these horizontally acquired genomic regions. This is particularly desirable when the aim is to construct dated phylogenies, as horizontally acquired variation can interfere with the molecular clock signal on which dating relies. Existing recombination detection programs perform well in certain contexts, but their algorithms are not suitable for datasets with very high diversity or thousands of genomes. We addressed this gap by developing the software package Verticall. We show this approach produces comparable results to existing software for smaller, more clonal datasets, but also performs well on datasets that the existing packages cannot handle.
Data Summary: Verticall is available free and open source at https://github.com/rrwick/Verticall. We used published whole-genome sequence data deposited in public databases (Pathogenwatch [https://pathogen.watch/]; European Nucleotide Archive [https://www.ebi.ac.uk/ena/]; Sequence Read Archive [https://www.ncbi.nlm.nih.gov/sra/]). Accession numbers for the raw whole-genome sequences are presented in Tables S2-S6. All data, code, and analysis commands used to generate the results and figures presented in this paper are available on figshare (DOI: 10.6084/m9.figshare.31930821) and GitHub (https://github.com/erkison/verticall_paper).
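
The idea behind the distance-tree approach in the Verticall abstract, estimating pairwise distance only from regions classed as vertical, can be illustrated with a toy sketch. This is not Verticall's algorithm or its API. It assumes pre-aligned sequences of equal length, fixed-size windows, and a deliberately simple rule in which a window counts as vertical when its distance lies within a tolerance of the low median window distance. All names and parameter values are hypothetical.

```python
from statistics import median_low


def window_distances(seq_a, seq_b, window):
    """Fraction of mismatched sites in each full window of two aligned sequences."""
    distances = []
    for start in range(0, len(seq_a) - window + 1, window):
        a = seq_a[start:start + window]
        b = seq_b[start:start + window]
        distances.append(sum(x != y for x, y in zip(a, b)) / window)
    return distances


def vertical_distance(seq_a, seq_b, window=10, tolerance=0.15):
    """Mean distance over windows classed as vertical, plus masked window indices.

    Assumption (not from the paper): a window is 'vertical' if its distance is
    within `tolerance` of the low median of all window distances. Sequences
    must be at least one window long.
    """
    dists = window_distances(seq_a, seq_b, window)
    centre = median_low(dists)  # always an observed value, so never all-masked
    vertical = [d for d in dists if abs(d - centre) <= tolerance]
    masked = [i for i, d in enumerate(dists) if abs(d - centre) > tolerance]
    return sum(vertical) / len(vertical), masked


def distance_matrix(genomes, window=10, tolerance=0.15):
    """Pairwise vertical-only distances keyed by sorted genome-name pairs."""
    names = sorted(genomes)
    return {(a, b): vertical_distance(genomes[a], genomes[b], window, tolerance)[0]
            for i, a in enumerate(names) for b in names[i + 1:]}


# Toy pair: one point mutation plus a fully divergent 10-site block.
seq_a = "A" * 50
seq_b = "C" + "A" * 19 + "C" * 10 + "A" * 20

naive = sum(x != y for x, y in zip(seq_a, seq_b)) / len(seq_a)
dist, masked = vertical_distance(seq_a, seq_b)
print(naive, dist, masked)  # naive distance is inflated by the divergent block
```

In this toy case the divergent block (window index 2) is masked, so the estimated distance reflects only the single point mutation rather than the imported region.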

18
LAS3R: A simple, secure, scalable, and robust framework for deploying lab automation devices

Shah, K. H.; Micklem, G.

2026-04-12 bioengineering 10.64898/2026.04.08.716564 medRxiv
Top 0.1%
1.3%

Laboratory automation can greatly accelerate experiments and data collection, yet building automated systems often requires substantial programming and electronics expertise, and few frameworks are targeted at deploying many devices. We present LAS3R, a low-cost, open-source framework that enables researchers with minimal technical expertise to rapidly prototype, deploy, remotely control, and collect data from multiple custom-built laboratory devices while maintaining strong security and reliability throughout the process--from early prototyping to routine operation. The system is built around a central hub to which multiple lab devices connect. This hub can be set up on a Raspberry Pi (a small, low-cost single-board computer) in under fifteen minutes. In the setup process, code is automatically generated for ESP32 microcontroller boards that control the hardware. Users can choose from a list of preconfigured ESP32 devices, for example a bioreactor, or use a template that provides base code for many common automation tasks, which they can then easily customise using the beginner-friendly Arduino platform. The ESP32 devices connect through a secure Wi-Fi network hosted by the Raspberry Pi that encrypts communication, and ensures only authorised hardware can join, helping safeguard experimental data and institutional networks, even while prototyping. We demonstrate the framework with two applications--a turbidostat bioreactor and a light-level controller--and show that it can simultaneously manage eight devices with 24 sensors. Robustness was evaluated through single-point-of-failure analysis, confirming continued operation during mains power or network interruptions. Comprehensive documentation, aimed at wet lab researchers, enables users to understand, build, and adapt the system, making it both a practical laboratory automation platform, including for those in low-resource settings, and a teaching resource. 
This paper is intended to be a technical evaluation of the architecture. Those wishing to deploy the system should refer to the online documentation at kavihshah.github.io/LAS3R.

19
SMEW: An interactive multi-scale toolkit for cross-condition and network-based analysis of spatial metabolomics data

Williams, E.; Hulme, H.; Zakirov, A.; Buszta, D.; Hamm, G.; Flint, L.; Franzen, L.; Olsson Lindvall, M.; Stamou, M.; Andersson, P.; Tan, J.; Ling, S.; Mohorianu, I.

2026-04-29 bioinformatics 10.64898/2026.04.27.721059 medRxiv
Top 0.1%
1.2%

Spatial metabolomics, measured through mass spectrometry imaging (MSI), provides high-throughput, spatially resolved information on metabolite distributions within tissues, including endogenous metabolites and exogenous compounds. This offers a direct readout of cellular biochemical activity and phenotypes that is not fully captured by transcriptomic or proteomic profiling. However, inferring biologically meaningful patterns from noisy, high-dimensional MSI data, particularly across multiple samples and complex experimental designs, remains challenging and often requires substantial programming expertise. Here we introduce SMEW (Spatial Metabolomics Enhanced Workflow), a flexible, interactive and shareable Shiny-based platform designed to enable code-free downstream analysis of spatial metabolomics MSI data. SMEW provides a unified environment for hierarchical analysis across bulk-, region- and pixel-level resolutions, allowing comparisons between experimental conditions such as disease or treatment groups while highlighting coherent metabolic patterns and linking these patterns to biological pathways. The workflow leverages local spatial covariation to robustly summarise MSI data through dimensionality reduction, clustering and identification of spatially variable metabolites. In addition, metabolite co-localisation and covariation network analysis, together with spatially resolved pathway enrichment, facilitate the biological interpretation of cross-condition datasets within a single integrated interface. SMEW is applicable across MSI technologies and mass resolutions, as illustrated through case studies on DESI and MALDI-ToF datasets from lung, liver, and kidney. By complementing existing MSI processing and visualisation tools with an accessible, multi-sample, and biologically interpretable analysis framework, SMEW enables functional, flexible, rigorous and intuitive exploration of spatial metabolomics datasets.
Graphical Abstract: [Figure 1 not reproduced]
Key Points:
- SMEW provides a flexible, interactive and shareable Shiny-based platform designed to enable code-free downstream analysis of spatial metabolomics MSI data.
- The SMEW framework enables hierarchical analysis at bulk-, region- and pixel levels within a unified framework without relying on extensive programming expertise.
- The pipeline integrates spatially aware clustering, pathway analysis and identification of metabolite co-localisation modules.
- The workflow facilitates flexible comparison of multi-sample experimental conditions through multivariate modelling, differential testing and covariation networks to study treatment- and disease-associated metabolite dynamics.
- SMEW has been applied to interrogate diverse biological questions, including characterising disease-associated remodelling in a mouse bleomycin model of pulmonary fibrosis, exploring the therapeutic index of antisense oligonucleotides in the liver and assessing metabolic heterogeneity in a small molecule-treated mouse renal tumour model.

20
MetaXtract: Extracting Metadata from Raw Files for FAIR Data Practices and Workflow Optimisation

Lutfi, A.; Chen, Z. A.; Fischer, L.; Rappsilber, J.

2026-03-16 bioinformatics 10.1101/2025.11.12.687968 medRxiv
Top 0.1%
1.1%

Mass spectrometry (MS) experiments generate rich acquisition metadata that are essential for reproducibility, data sharing, and quality control (QC). Because these metadata are typically stored only in vendor-specific formats, they often remain difficult to access. MetaXtract is a lightweight tool that extracts detailed parameters directly from Thermo Fisher raw files and exposes them in structured, tabular formats. By capturing sample information, LC-MS method settings, and scan-level metrics such as retention time, total ion current, and ion injection time, MetaXtract increases transparency and ensures that essential acquisition details accompany published data and results in an easily readable form. This supports FAIR data practices by improving the findability, accessibility, interoperability, and reusability of MS datasets after converting them to other formats, thereby increasing the value of deposition in public repositories. The importance of such metadata accessibility was recently highlighted by the crosslinking mass spectrometry community in efforts to advance FAIR data principles, and it extends to MS-based omics approaches more broadly. Importantly, MetaXtract enables search-free, near real-time performance monitoring by relying on acquisition-side signals, providing actionable indicators immediately after data acquisition rather than after database searching. Integration into automated pipelines also supports streamlined internal QC and troubleshooting in laboratories or repositories. By embedding acquisition parameters into routine data handling, MetaXtract strengthens reproducibility, optimises method development, and supports large-scale applications, including machine learning and secondary data analysis.
Graphical Abstract: [Figure 1 not reproduced]
Highlights:
- Metadata extraction from Thermo Fisher raw files
- Enhanced findability, accessibility, interoperability, and reusability of deposited data
- Integration into workflows via GUI and command-line modes
- Troubleshooting support by visualizing MS1/MS2 scan details
- Indexed MS1/MS2 peak list export enabling machine learning workflows
Availability: MetaXtract is available for free download as open-source software at https://github.com/Rappsilber-Laboratory/MetaXtract and is licensed under the Apache-2.0 license.